Search CORE

7 research outputs found

Towards the Automatic Classification of Documents in User-generated Classifications

Author: Morshed Ahsan-Ul
Publication date: 01/01/2006

There is a huge amount of information scattered across the World Wide Web. Because information flows through the Web at high speed, it needs to be organized properly so that users can access it easily. Previously, information was generally organized manually, by matching document contents to predefined categories. There are two approaches to this text-based categorization: manual and automatic. In the manual approach, a human expert performs the classification task; in the automatic approach, supervised classifiers are used to classify resources. Supervised classification still requires manual interaction to create training data before the automatic classification can take place. In our new approach, we propose to classify documents automatically by means of semantic keywords and formulas generated from these keywords. In this way, we can reduce human participation by combining the knowledge of a given classification with the knowledge extracted from the data. The main focus of this PhD thesis, supervised by Prof. Fausto Giunchiglia, is the automatic classification of documents into user-generated classifications. The key benefits foreseen from this automatic document classification relate not only to search engines but also to many other fields, such as document organization, text filtering, semantic index management
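As a rough illustration only, the sketch below shows one way documents could be classified into a user-generated hierarchy using keywords and per-node formulas. It assumes that a node's formula is simply the conjunction of the keywords in the labels on its path from the root, and that a document belongs to the most specific node whose formula it fully satisfies. `Node`, `keywords`, `node_formulas` and `classify` are hypothetical names, and plain word tokens stand in for the semantic keywords (word senses) the abstract refers to.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a user-generated classification (hypothetical structure)."""
    label: str
    children: list[Node] = field(default_factory=list)


def keywords(text: str) -> frozenset[str]:
    """Naive keyword extraction: lowercase alphabetic tokens.
    Stand-in for semantic keywords (word senses), which this sketch does not model."""
    return frozenset(re.findall(r"[a-z]+", text.lower()))


def node_formulas(node: Node, path: tuple[str, ...] = (),
                  inherited: frozenset[str] = frozenset()):
    """Yield (path, formula) pairs. A formula is modelled as the set of keywords
    that must all occur, i.e. the conjunction of the labels from the root to the node."""
    path = path + (node.label,)
    formula = inherited | keywords(node.label)
    yield path, formula
    for child in node.children:
        yield from node_formulas(child, path, formula)


def classify(doc: str, root: Node) -> tuple[str, ...] | None:
    """Return the path of the deepest node whose formula the document satisfies
    (the first one found wins ties), or None if no node matches."""
    doc_terms = keywords(doc)
    best = None
    for path, formula in node_formulas(root):
        if formula <= doc_terms and (best is None or len(path) > len(best)):
            best = path
    return best


# Toy classification: Computers > {Software > Databases, Hardware}
root = Node("Computers", [Node("Software", [Node("Databases")]), Node("Hardware")])
print(classify("Databases and software on modern computers", root))
# ('Computers', 'Software', 'Databases')
```

With this toy hierarchy, the document mentions "databases", "software" and "computers", so it lands in the deepest node, Computers > Software > Databases. A document that says "database" (singular) would stop at Software, which illustrates why matching on word senses rather than raw tokens matters.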

Unitn-eprints Research

Creating and Aligning Controlled Vocabularies

Author: Morshed Ahsan-ul
Sini Margherita
Publication date: 01/08/2009

E-LIS

Unitn-eprints Research

Aligning controlled vocabularies using faceted based approach

Author: Keizer Johannes
Morshed Ahsan-ul
Sini Margherita
Publication date: 23/11/2009

A vocabulary stores words, synonyms, word-sense definitions (i.e., glosses), and relations between word senses and concepts; such a vocabulary is generally called a Controlled Vocabulary (CV) if the terms are chosen or selected by domain specialists. A facet is a distinct, dimensional feature of a concept or term that allows a taxonomy, ontology, or controlled vocabulary to be viewed or ordered in multiple ways rather than in a single way. A facet is also clearly defined, mutually exclusive, and composed of collectively exhaustive aspects, properties, or characteristics of a domain. For example, a collection of rice might be represented using a name facet, a place facet, etc. In our case, we build a facet for each concept from its more general concepts (broader terms), less general concepts (narrower terms), and related concepts (related terms); we call this a concept facet (CF). We use these CFs to map two controlled vocabularies. This methodology is based on hidden semantic matching, which differs from the orthodox view of matching
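A minimal sketch of mapping two controlled vocabularies through concept facets, assuming a CF is just a concept label together with its broader, narrower and related terms, and that two concepts match when their facets overlap enough. The Jaccard overlap and the 0.3 threshold are stand-ins chosen for illustration, not the matching measure used in the paper; `ConceptFacet`, `jaccard` and `align` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptFacet:
    """Concept facet (CF): a concept with its broader, narrower and related terms."""
    concept: str
    broader: tuple[str, ...] = ()
    narrower: tuple[str, ...] = ()
    related: tuple[str, ...] = ()

    def terms(self) -> frozenset[str]:
        """All facet terms (lowercased), including the concept label itself."""
        return frozenset(t.lower() for t in
                         (self.concept, *self.broader, *self.narrower, *self.related))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Overlap of two term sets (assumed similarity measure, not the paper's)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align(source: list[ConceptFacet], target: list[ConceptFacet],
          threshold: float = 0.3) -> dict[str, str]:
    """Map each source concept to the target concept with the most similar facet,
    keeping only matches whose similarity reaches the (hypothetical) threshold."""
    mapping = {}
    for s in source:
        best = max(target, key=lambda t: jaccard(s.terms(), t.terms()), default=None)
        if best is not None and jaccard(s.terms(), best.terms()) >= threshold:
            mapping[s.concept] = best.concept
    return mapping


# Two toy vocabularies whose labels for the same concept differ.
source = [
    ConceptFacet("rice", broader=("cereals",), narrower=("brown rice",), related=("paddy",)),
    ConceptFacet("soybeans", broader=("legumes",)),
]
target = [
    ConceptFacet("Oryza sativa", broader=("cereals",), narrower=("brown rice",), related=("paddy",)),
    ConceptFacet("wheat", broader=("cereals",), related=("flour",)),
]
print(align(source, target))  # {'rice': 'Oryza sativa'}
```

Here "rice" maps to "Oryza sativa" although the labels differ, because the two facets share three of their five distinct terms (Jaccard 0.6); "soybeans" shares no terms with any target facet and stays unmapped.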

E-LIS

Classifications

Author: Fausto Giunchiglia
Md. Ahsan-ul Morshed
Supervisor: Prof. Fausto Giunchiglia

Abstract. There is a huge amount of information scattered across the World Wide Web. Because information flows through the Web at high speed, it needs to be organized properly so that users can access it easily. Previously, information was generally organized manually, by matching document contents to predefined categories. There are two approaches to this text-based categorization: manual and automatic. In the manual approach, a human expert performs the classification task; in the automatic approach, supervised classifiers are used to classify resources. Supervised classification still requires manual interaction to create training data before the automatic classification can take place. In our new approach, we propose to classify documents automatically by means of semantic keywords and formulas generated from these keywords. In this way, we can reduce human participation by combining the knowledge of a given classification with the knowledge extracted from the data. The main focus of this PhD thesis, supervised by Prof. Fausto Giunchiglia, is the automatic classification of documents into user-generated classifications. The key benefits foreseen from this automatic document classification relate not only to search engines but also to many other fields, such as document organization, text filtering, semantic index management

CiteSeerX